
Add Engram model structure integration (v1) #3689

Draft
ilml wants to merge 2 commits into NVIDIA:dev from ilml:tolong/engram

Conversation


@ilml ilml commented Mar 4, 2026

Summary

  • Integrates DeepSeek's Engram n-gram hash embedding module into Megatron-LM as a new model type under megatron/core/models/engram/
  • Extends GPTModel with Engram-augmented transformer layers that inject gated n-gram embeddings before self-attention at configurable layer positions
  • Includes builder, layer specs, and a pretrain_engram.py entry point following existing Mcore patterns (similar to the Mamba integration)
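The core idea in the summary, injecting gated n-gram hash embeddings before self-attention, can be sketched in a few lines. This is an illustrative toy only: the real implementation lives in megatron/core/models/engram/engram_module.py, and every name below (ngram_hash_ids, gated_ngram_inject, the rolling-hash constants) is hypothetical.

```python
# Toy sketch of gated n-gram hash embedding injection; hypothetical names,
# numpy instead of torch for self-containment.
import numpy as np

def ngram_hash_ids(token_ids, n=2, table_size=97, seed=0x9E3779B1):
    """Map each position's trailing n-gram of token IDs to a hash bucket.

    Only tokens at or before position t are hashed, so the mapping is
    causal (no lookahead).
    """
    ids = []
    for t in range(len(token_ids)):
        h = seed
        for tok in token_ids[max(0, t - n + 1): t + 1]:
            h = (h * 1000003 + tok) % (1 << 32)   # simple rolling hash
        ids.append(h % table_size)
    return np.array(ids)

def gated_ngram_inject(hidden, table, hash_ids, gate):
    """Add gated n-gram embeddings to hidden states before self-attention."""
    return hidden + gate * table[hash_ids]        # [S, H] + [S, H]

S, H, V = 6, 4, 97
rng = np.random.default_rng(0)
hidden = rng.standard_normal((S, H))
table = rng.standard_normal((V, H))     # hash-embedding table
tokens = [5, 9, 5, 9, 2, 7]
ids = ngram_hash_ids(tokens)
out = gated_ngram_inject(hidden, table, ids, gate=0.1)
print(out.shape)  # (6, 4)
```

Note that identical trailing n-grams (here the bigram 5, 9 at positions 1 and 3) map to the same bucket, which is what lets the embedding table memorize recurring local patterns.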

Components

| File | Purpose |
| --- | --- |
| megatron/core/models/engram/engram_module.py | Core: CompressedTokenizer, NgramHashMapping, MultiHeadEmbedding, ShortConv, EngramModule |
| megatron/core/models/engram/engram_layer.py | EngramTransformerLayer: extends TransformerLayer with a pre-attention Engram step |
| megatron/core/models/engram/engram_model.py | EngramGPTModel: extends GPTModel with hash pre-computation |
| megatron/core/models/engram/engram_layer_specs.py | Layer spec factory for Engram layers |
| engram_builders.py | Model builder |
| pretrain_engram.py | Training entry point with Engram CLI args |

Design decisions

  • Extends GPTModel (not LanguageModule) since Engram is fundamentally GPT + extra module in specific layers
  • Two-phase forward: hash IDs are pre-computed at the model level (CPU/numpy), embeddings are cached in each EngramModule and consumed during the layer forward; this avoids modifying TransformerBlock's interface
  • HC_MULT (hyper-connection multiplier) is handled internally within EngramModule via expand/collapse, so it stays compatible with the standard [S, B, H] tensor flow
  • No core Megatron files are modified; the integration is pure inheritance plus the ModuleSpec system
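The two-phase forward described above can be sketched as a cache-and-consume pattern. The class and method names here (ToyEngramModule, precompute) are hypothetical stand-ins for the real EngramModule/EngramGPTModel wiring; the point is only that the model fills each module's cache before the ordinary layer loop runs, so the layer signature never changes.

```python
# Sketch of the two-phase forward: phase 1 precomputes and caches the
# n-gram embeddings at the model level; phase 2 is the unchanged layer loop,
# where each Engram layer consumes its cached slice. Hypothetical names.
import numpy as np

class ToyEngramModule:
    def __init__(self):
        self._cached = None

    def precompute(self, emb):          # phase 1: called at model level
        self._cached = emb

    def __call__(self, hidden):         # phase 2: called inside layer forward
        assert self._cached is not None, "precompute() must run first"
        out = hidden + self._cached
        self._cached = None             # consume the cache
        return out

class ToyModel:
    def __init__(self, n_layers, engram_layer_ids):
        self.n_layers = n_layers
        self.engram = {i: ToyEngramModule() for i in engram_layer_ids}

    def forward(self, hidden, ngram_emb):
        for mod in self.engram.values():   # phase 1: fill every cache
            mod.precompute(ngram_emb)
        for i in range(self.n_layers):     # phase 2: ordinary layer loop
            if i in self.engram:
                hidden = self.engram[i](hidden)
        return hidden

h = np.zeros((3, 2))
model = ToyModel(n_layers=4, engram_layer_ids=[1, 3])
out = model.forward(h, np.ones((3, 2)))
print(out.sum())   # 12.0: two Engram layers each added a [3, 2] block of ones
```

Because the cache is filled out-of-band, the per-layer call signature stays identical to a plain transformer layer, which is how the design avoids touching TransformerBlock.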

Known limitations (v1)

  • HC dimension is local per Engram layer (not persisted across layers)
  • Engram embedding tables are not tensor-parallel-sharded
  • Hash computation runs on CPU (numpy)
  • The sympy dependency was replaced with a pure-Python primality test
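On the last point: the PR does not show its primality code, but a pure-Python replacement for sympy's check could look like the standard deterministic Miller-Rabin below (illustrative; the function names are hypothetical). Prime moduli are a common choice for sizing hash-embedding tables, since they reduce collisions from structured IDs.

```python
# Deterministic Miller-Rabin primality test (standard algorithm); a sketch
# of what a pure-Python replacement for sympy's isprime might look like.
def is_prime(n):
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    # These witnesses make the test deterministic for all n < 3.3e24.
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(n):
    """Smallest prime >= n, handy for choosing a hash-table modulus."""
    while not is_prime(n):
        n += 1
    return n

print(next_prime(100))  # 101
```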

Test plan

  • Verify model construction with --engram-layer-ids argument
  • Single-GPU forward pass sanity check
  • Compare Engram module output shapes against reference implementation
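For the first test-plan item, a minimal sketch of how an --engram-layer-ids flag might be parsed and validated follows. The flag name comes from the PR; the parsing helper and its behavior (comma-separated, deduplicated, range-checked) are assumptions, since pretrain_engram.py's actual CLI wiring is not shown here.

```python
# Hypothetical sketch of parsing/validating --engram-layer-ids; the real
# argument handling in pretrain_engram.py may differ.
import argparse

def parse_layer_ids(text, num_layers):
    """Parse '2,5,8' into sorted unique ints, validated against num_layers."""
    ids = sorted({int(x) for x in text.split(",") if x.strip()})
    bad = [i for i in ids if not 0 <= i < num_layers]
    if bad:
        raise ValueError(f"engram layer ids out of range: {bad}")
    return ids

parser = argparse.ArgumentParser()
parser.add_argument("--engram-layer-ids", default="")
parser.add_argument("--num-layers", type=int, default=12)
args = parser.parse_args(["--engram-layer-ids", "2,5,8"])
print(parse_layer_ids(args.engram_layer_ids, args.num_layers))  # [2, 5, 8]
```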

Made with Cursor

Integrate DeepSeek's Engram n-gram hash embedding module into Megatron-LM.
This initial version focuses on model structure only, extending GPTModel
with Engram-augmented transformer layers that inject gated n-gram embeddings
before self-attention at configurable layer positions.

Key components:
- EngramModule: n-gram hashing, multi-head embedding, gated value projection,
  causal short convolution with hyper-connection multiplier
- EngramTransformerLayer: extends TransformerLayer with pre-attention Engram
- EngramGPTModel: extends GPTModel with hash pre-computation from input_ids
- Layer specs, builder, and pretrain entry point following Mcore patterns
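The "causal short convolution" mentioned for EngramModule can be sketched as a small convolution over the sequence axis with left-only padding, so position t never sees tokens after t. This is a toy illustration (hypothetical names, numpy for clarity), not the ShortConv implementation itself.

```python
# Toy causal short convolution over the sequence axis: each output position
# mixes only the current and the previous K-1 positions. Hypothetical names.
import numpy as np

def causal_short_conv(x, kernel):
    """x: [S, H]; kernel: [K, H]. Output t depends on x[t-K+1..t] only."""
    S, H = x.shape
    K = kernel.shape[0]
    padded = np.concatenate([np.zeros((K - 1, H)), x], axis=0)  # left-pad
    out = np.zeros_like(x)
    for t in range(S):
        out[t] = (padded[t:t + K] * kernel).sum(axis=0)
    return out

x = np.arange(8, dtype=float).reshape(4, 2)   # S=4, H=2
k = np.ones((2, 2))                           # K=2: a causal moving sum
y = causal_short_conv(x, k)
print(y[0].tolist())  # [0.0, 1.0]: the first step sees only x[0]
```

The zero left-padding is what enforces causality; a production version would typically be a depthwise Conv1d with the same padding scheme.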

Made-with: Cursor

copy-pr-bot bot commented Mar 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

